Inter-element dependency models for sequence classification

Authors

  • Adrian Silvescu
  • Carson Andorf
  • Drena Dobbs
  • Vasant Honavar
Abstract

Naive Bayes is a fast-to-train model for sequence classification. We develop and experiment with two methods that are equally fast to train (they require only one pass through the training data) but, unlike Naive Bayes with its independence assumption, can also model interactions among close neighbours in the sequence. The first method runs Naive Bayes on overlapping k-grams obtained from the sequence. The second method, called NB(k), constructs a classifier associated with an undirected graphical model over the sequence that takes k-wise dependencies into account. We test our algorithms on protein function classification tasks based on functional families from the Gene Ontology (GO) database. Our results show significant improvements of the proposed methods over Naive Bayes, with NB(k) outperforming NB k-grams in all test cases. Despite their improved modelling and test accuracy, the two proposed methods retain the “one pass through the data only” training property of Naive Bayes, thus yielding fairly accurate and efficient classifiers.
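The two methods named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the following are assumptions: NB k-grams is scored as a product of class-conditional k-gram probabilities, and NB(k) divides that product by the probabilities of the (k-1)-grams shared by consecutive windows (the usual clique-over-separator factorization of a chain-structured undirected model). The class name `KGramNaiveBayes`, the Laplace smoothing parameter `alpha`, the `alphabet_size` argument, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def kgrams(seq, k):
    """Return all overlapping windows of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KGramNaiveBayes:
    """Count-based sequence classifier over overlapping k-grams (illustrative)."""

    def __init__(self, k, alphabet_size, alpha=1.0):
        self.k = k
        self.alphabet_size = alphabet_size  # assumed known, e.g. 20 amino acids
        self.alpha = alpha                  # Laplace smoothing (an assumption)
        self.class_counts = Counter()
        # counts[c][m] maps each m-gram to its frequency in class c
        self.counts = defaultdict(lambda: defaultdict(Counter))
        # totals[c][m] is the number of m-grams observed in class c
        self.totals = defaultdict(Counter)

    def fit(self, sequences, labels):
        # Single pass over the training data: only counts are accumulated.
        for seq, c in zip(sequences, labels):
            self.class_counts[c] += 1
            for m in (self.k, self.k - 1):
                if m < 1:
                    continue
                for gram in kgrams(seq, m):
                    self.counts[c][m][gram] += 1
                    self.totals[c][m] += 1
        return self

    def _log_p(self, gram, c):
        # Smoothed class-conditional probability of one m-gram.
        m = len(gram)
        num = self.counts[c][m][gram] + self.alpha
        den = self.totals[c][m] + self.alpha * self.alphabet_size ** m
        return math.log(num / den)

    def log_score(self, seq, c, mode="nbk"):
        prior = self.class_counts[c] / sum(self.class_counts.values())
        score = math.log(prior)
        # Both modes multiply the probabilities of all overlapping k-grams.
        score += sum(self._log_p(g, c) for g in kgrams(seq, self.k))
        if mode == "nbk" and self.k > 1:
            # Assumed NB(k) correction: divide by the (k-1)-grams shared by
            # consecutive windows, i.e. all (k-1)-grams except first and last.
            shared = kgrams(seq, self.k - 1)[1:-1]
            score -= sum(self._log_p(g, c) for g in shared)
        return score

    def predict(self, seq, mode="nbk"):
        return max(self.class_counts,
                   key=lambda c: self.log_score(seq, c, mode))


# Toy two-letter alphabet; both classes have similar letter frequencies
# but different neighbour patterns.
train_x = ["ABABAB", "ABABABAB", "AABBAABB", "BBAABBAA"]
train_y = ["alternating", "alternating", "paired", "paired"]
model = KGramNaiveBayes(k=2, alphabet_size=2).fit(train_x, train_y)
print(model.predict("ABAB", mode="kgram"), model.predict("AABB", mode="nbk"))
```

In this sketch both scoring modes share the same counts, so switching between them costs nothing at training time; the only difference is whether the overlap between neighbouring windows is divided out when scoring.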


Similar resources

Intrusion Trace Classification using Inter-element Dependency Models with k-Truncated Generalized Suffix Tree

We present a scalable and accurate method for classifying program traces to detect system intrusion attempts. By employing inter-element dependency models to overcome the independence violation problem inherent in the Naïve Bayes learners, our method yields intrusion detectors with better accuracy. For efficient counting of n-gram features without losing accuracy, we use a k-truncated generaliz...


A hybrid approach for database intrusion detection at transaction and inter-transaction levels

Nowadays, information plays an important role in organizations. Sensitive information is often stored in databases. Traditional mechanisms such as encryption, access control, and authentication cannot provide a high level of confidence. Therefore, the existence of Intrusion Detection Systems in databases is necessary. In this paper, we propose an intrusion detection system for detecting attacks...


Porosity classification from thin sections using image analysis and neural networks including shallow and deep learning in Jahrum formation

The porosity within a reservoir rock is a basic parameter for the reservoir characterization. The present paper introduces two intelligent models for identification of the porosity types using image analysis. For this aim, firstly, thirteen geometrical parameters of pores of each image were extracted using the image analysis techniques. The extracted features and their corresponding pore types ...


Compositional Sentence Representation from Character Within Large Context Text

In this work, we targeted two problems of representing a sentence on the basis of a constituent word sequence: a data-sparsity problem in non-compositional word embedding, and no usage of inter-sentence dependency. To improve these two problems, we propose a Hierarchical Composition Recurrent Network (HCRN), which consists of a hierarchy with 3 levels of compositional models: character, word an...


Numerical investigation of the behavior of fiber-metal laminate composite plates under low-velocity impact

In this article, low velocity impact behavior of fiber metal laminate (FML) composite plates is investigated under three different impact energies (12.7 J, 16.3 J and 24.2 J). Here, three modeling techniques are used. In one of the models the inter-laminar damage is neglected (model without delamination) and in other two models this damage is simulated using cohesive element and cohesive surfac...




Publication date: 2004